A non-stochastic (deterministic) gradient method can take exponential time to escape a saddle point.
from Taiji Suzuki, Mathematics of Deep Learning
https://gyazo.com/058618441879943ddea251d5f8cdd105
SGD was originally adopted not for this reason but for computational performance; only later did it turn out that this had, unexpectedly, been the "right" way of doing things.
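As a toy illustration of the claim above (a hypothetical sketch, not from the original page): on f(x, y) = x² − y², the origin is a saddle point, and deterministic gradient descent started exactly on the stable manifold (y = 0) never leaves it, while the noise that SGD injects lets the iterate drift away along the unstable −y² direction.

```python
import numpy as np

def grad(p):
    # Gradient of f(x, y) = x^2 - y^2, which has a saddle at the origin.
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def descend(p, lr=0.1, steps=50, noise=0.0, seed=0):
    # Plain gradient descent; noise > 0 mimics the stochasticity of SGD.
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        g = grad(p)
        if noise:
            g = g + rng.normal(scale=noise, size=2)
        p = p - lr * g
    return p

start = np.array([1.0, 0.0])        # exactly on the stable manifold y = 0

det = descend(start)                 # deterministic: y stays pinned at 0
sto = descend(start, noise=0.01)     # noisy: y is kicked off 0 and amplified

print(abs(det[1]))  # remains (numerically) zero: no escape
print(abs(sto[1]))  # grows: the noise triggers escape from the saddle
```

The point of the sketch is qualitative: the deterministic iterate has no force pushing it off the y = 0 manifold, whereas even tiny gradient noise is amplified by the unstable direction, which matches the intuition that stochasticity helps escape saddle points.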
---
This page is auto-translated from /nishio/非確率的勾配法は鞍点から出るのに指数時間かかる using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.